JianNan ZHANG JiJun ZHOU JianFeng WU ShengYing YANG
Convolutional neural networks (CNNS) have a strong ability to understand and judge images. However, the enormous parameters and computation of CNNS have limited its application in resource-limited devices. In this letter, we used the idea of parameter sharing and dense connection to compress the parameters in the convolution kernel channel direction, thus greatly reducing the number of model parameters. On this basis, we designed Shared and Dense Channel-wise Convolutional Networks (SDChannelNets), mainly composed of Depth-wise Separable SD-Channel-wise Convolution layer. The advantage of SDChannelNets is that the number of model parameters is greatly reduced without or with little loss of accuracy. We also introduced a hyperparameter that can effectively balance the number of parameters and the accuracy of a model. We evaluated the model proposed by us through two popular image recognition tasks (CIFAR-10 and CIFAR-100). The results showed that SDChannelNets had similar accuracy to other CNNs, but the number of parameters was greatly reduced.
Lianqiang LI Jie ZHU Ming-Ting SUN
Convolutional Neural Networks (CNNs) usually have millions or even billions of parameters, which make them hard to be deployed into mobile devices. In this work, we present a novel filter-level pruning method to alleviate this issue. More concretely, we first construct an undirected fully connected graph to represent a pre-trained CNN model. Then, we employ the spectral clustering algorithm to divide the graph into some subgraphs, which is equivalent to clustering the similar filters of the CNN into the same groups. After gaining the grouping relationships among the filters, we finally keep one filter for one group and retrain the pruned model. Compared with previous pruning methods that identify the redundant filters by heuristic ways, the proposed method can select the pruning candidates more reasonably and precisely. Experimental results also show that our proposed pruning method has significant improvements over the state-of-the-arts.
Kota ANDO Kodai UEYOSHI Yuka OBA Kazutoshi HIROSE Ryota UEMATSU Takumi KUDO Masayuki IKEBE Tetsuya ASAI Shinya TAKAMAEDA-YAMAZAKI Masato MOTOMURA
Deep neural network (NN) has been widely accepted for enabling various AI applications, however, the limitation of computational and memory resources is a major problem on mobile devices. Quantized NN with a reduced bit precision is an effective solution, which relaxes the resource requirements, but the accuracy degradation due to its numerical approximation is another problem. We propose a novel quantized NN model employing the “dithering” technique to improve the accuracy with the minimal additional hardware requirement at the view point of the hardware-algorithm co-designing. Dithering distributes the quantization error occurring at each pixel (neuron) spatially so that the total information loss of the plane would be minimized. The experiment we conducted using the software-based accuracy evaluation and FPGA-based hardware resource estimation proved the effectiveness and efficiency of the concept of an NN model with dithering.
Guodong SUN Zhen ZHOU Yuan GAO Yun XU Liang XU Song LIN
In this paper we design a fast fabric defect detection framework (Fast-DDF) based on gray histogram back-projection, which adopts end to end multi-convoluted network model to realize defect classification. First, the back-projection image is established through the gray histogram on fabric image, and the closing operation and adaptive threshold segmentation method are performed to screen the impurity information and extract the defect regions. Then, the defect images segmented by the Fast-DDF are marked and normalized into the multi-layer convolutional neural network for training. Finally, in order to solve the problem of difficult adjustment of network model parameters and long training time, some strategies such as batch normalization of samples and network fine tuning are proposed. The experimental results on the TILDA database show that our method can deal with various defect types of textile fabrics. The average detection accuracy with a higher rate of 96.12% in the database of five different defects, and the single image detection speed only needs 0.72s.
Ryo MASUMURA Taichi ASAMI Takanobu OBA Sumitaka SAKAUCHI Akinori ITO
This paper demonstrates latent word recurrent neural network language models (LW-RNN-LMs) for enhancing automatic speech recognition (ASR). LW-RNN-LMs are constructed so as to pick up advantages in both recurrent neural network language models (RNN-LMs) and latent word language models (LW-LMs). The RNN-LMs can capture long-range context information and offer strong performance, and the LW-LMs are robust for out-of-domain tasks based on the latent word space modeling. However, the RNN-LMs cannot explicitly capture hidden relationships behind observed words since a concept of a latent variable space is not present. In addition, the LW-LMs cannot take into account long-range relationships between latent words. Our idea is to combine RNN-LM and LW-LM so as to compensate individual disadvantages. The LW-RNN-LMs can support both a latent variable space modeling as well as LW-LMs and a long-range relationship modeling as well as RNN-LMs at the same time. From the viewpoint of RNN-LMs, LW-RNN-LM can be considered as a soft class RNN-LM with a vast latent variable space. In contrast, from the viewpoint of LW-LMs, LW-RNN-LM can be considered as an LW-LM that uses the RNN structure for latent variable modeling instead of an n-gram structure. This paper also details a parameter inference method and two kinds of implementation methods, an n-gram approximation and a Viterbi approximation, for introducing the LW-LM to ASR. Our experiments show effectiveness of LW-RNN-LMs on a perplexity evaluation for the Penn Treebank corpus and an ASR evaluation for Japanese spontaneous speech tasks.
Baojun ZHAO Boya ZHAO Linbo TANG Baoxian WANG
Towards involving the convolutional neural networks into the object detection field, many computer vision tasks have achieved favorable successes. In order to adapt targets with various scales, deep feature pyramid is widely used, since the traditional object detection methods detect different objects in Gaussian image pyramid. However, due to the mismatching between the anchors and the feature distributions of targets, the accurate detection for targets with various scales is still a challenge. Considering the differences between the theoretical receptive field and effective receptive field, we propose a novel anchor generation method, which takes the effective receptive field as the standard. The proposed method is evaluated on the PASCAL VOC dataset and shows the favorable results.
Xiao-Yi ZHAO Chao-Yi DONG Peng ZHOU Mei-Jia ZHU Jing-Wen REN Xiao-Yan CHEN
The paper employed an Alexnet, which is a deep learning framework, to automatically diagnose the damages of wind power generator blade surfaces. The original images of wind power generator blade surfaces were captured by machine visions of a 4-rotor UAV (unmanned aerial vehicle). Firstly, an 8-layer Alexnet, totally including 21 functional sub-layers, is constructed and parameterized. Secondly, the Alexnet was trained with 10000 images and then was tested by 6-turn 350 images. Finally, the statistic of network tests shows that the average accuracy of damage diagnosis by Alexnet is about 99.001%. We also trained and tested a traditional BP (Back Propagation) neural network, which have 20-neuron input layer, 5-neuron hidden layer, and 1-neuron output layer, with the same image data. The average accuracy of damage diagnosis of BP neural network is 19.424% lower than that of Alexnet. The point shows that it is feasible to apply the UAV image acquisition and the deep learning classifier to diagnose the damages of wind turbine blades in service automatically.
Motohiro TAKAGI Akito SAKURAI Masafumi HAGIWARA
Current image quality assessment (IQA) methods require the original images for evaluation. However, recently, IQA methods that use machine learning have been proposed. These methods learn the relationship between the distorted image and the image quality automatically. In this paper, we propose an IQA method based on deep learning that does not require a reference image. We show that a convolutional neural network with distortion prediction and fixed filters improves the IQA accuracy.
Menghan JIA Feiteng LI Zhijian CHEN Xiaoyan XIANG Xiaolang YAN
An R-peak detection method with a high noise tolerance is presented in this paper. This method utilizes a customized deep convolution neural network (DCNN) to extract morphological and temporal features from sliced electrocardiogram (ECG) signals. The proposed network adopts multiple parallel dilated convolution layers to analyze features from diverse fields of view. A sliding window slices the original ECG signals into segments, and then the network calculates one segment at a time and outputs every point's probability of belonging to the R-peak regions. After a binarization and a deburring operation, the occurrence time of the R-peaks can be located. Experimental results based on the MIT-BIH database show that the R-peak detection accuracies can be significantly improved under high intensity of the electrode motion artifact or muscle artifact noise, which reveals a higher performance than state-of-the-art methods.
JianFeng WU HuiBin QIN YongZhu HUA LiHuan SHAO Ji HU ShengYing YANG
This paper proposes a deep neural network (DNN) based framework to address the problem of vector quantization (VQ) for high-dimensional data. The main challenge of applying DNN to VQ is how to reduce the binary coding error of the auto-encoder when the distribution of the coding units is far from binary. To address this problem, three fine-tuning methods have been adopted: 1) adding Gaussian noise to the input of the coding layer, 2) forcing the output of the coding layer to be binary, 3) adding a non-binary penalty term to the loss function. These fine-tuning methods have been extensively evaluated on quantizing speech magnitude spectra. The results demonstrated that each of the methods is useful for improving the coding performance. When implemented for quantizing 968-dimensional speech spectra using only 18-bit, the DNN-based VQ framework achieved an averaged PESQ of about 2.09, which is far beyond the capability of conventional VQ methods.
Tuan Linh DANG Yukinobu HOSHINO
This paper presents a hybrid architecture for a neural network (NN) trained by a particle swarm optimization (PSO) algorithm. The NN is implemented on the hardware side while the PSO is executed by a processor on the software side. In addition, principal component analysis (PCA) is also applied to reduce correlated information. The PCA module is implemented in hardware by the SystemVerilog programming language to increase operating speed. Experimental results showed that the proposed architecture had been successfully implemented. In addition, the hardware-based NN trained by PSO (NN-PSO) program was faster than the software-based NN trained by the PSO program. The proposed NN-PSO with PCA also obtained better recognition rates than the NN-PSO without-PCA.
Se-Min LIM Jooyoung PARK Hyeong-Cheol OH
This study designs a low-cost portable device that functions as a coaching assistant system which can support table tennis practice. Although deep learning technology is a promising solution to realizing human activity recognition, we propose using cosine similarity in making inferences. Our experiments show that the cosine similarity based inference can be a good alternative to the deep learning based inference for the assistant system when resources are limited.
Peng CHEN Weijun LI Linjun SUN Xin NING Lina YU Liping ZHANG
Human gender recognition in the wild is a challenging task due to complex face variations, such as poses, lighting, occlusions, etc. In this letter, learnable Gabor convolutional network (LGCN), a new neural network computing framework for gender recognition was proposed. In LGCN, a learnable Gabor filter (LGF) is introduced and combined with the convolutional neural network (CNN). Specifically, the proposed framework is constructed by replacing some first layer convolutional kernels of a standard CNN with LGFs. Here, LGFs learn intrinsic parameters by using standard back propagation method, so that the values of those parameters are no longer fixed by experience as traditional methods, but can be modified by self-learning automatically. In addition, the performance of LGCN in gender recognition is further improved by applying a proposed feature combination strategy. The experimental results demonstrate that, compared to the standard CNNs with identical network architecture, our approach achieves better performance on three challenging public datasets without introducing any sacrifice in parameter size.
Yingwei FU Kele XU Haibo MI Qiuqiang KONG Dezhi WANG Huaimin WANG Tie HONG
Sound event detection is intended to identify the sound events in audio recordings, which has widespread applications in real life. Recently, convolutional recurrent neural network (CRNN) models have achieved state-of-the-art performance in this task due to their capabilities in learning the representative features. However, the CRNN models are of high complexities with millions of parameters to be trained, which limits their usage for the mobile and embedded devices with limited computation resource. Model distillation is effective to distill the knowledge of a complex model to a smaller one, which can be deployed on the devices with limited computational power. In this letter, we propose a novel multi model-based distillation approach for sound event detection by making use of the knowledge from models of multiple teachers which are complementary in detecting sound events. Extensive experimental results demonstrated that our approach achieves a compression ratio about 50 times. In addition, better performance is obtained for the sound event detection task.
To the best of our knowledge, there are a few researches on air-handwriting character-level writer identification only employing acceleration and angular velocity data. In this paper, we propose a deep learning approach to writer identification only using inertial sensor data of air-handwriting. In particular, we separate different representations of degree of freedom (DoF) of air-handwriting to extract local dependency and interrelationship in different CNNs separately. Experiments on a public dataset achieve an average good performance without any extra hand-designed feature extractions.
Naranchimeg BOLD Chao ZHANG Takuya AKASHI
In recent decade, many state-of-the-art algorithms on image classification as well as audio classification have achieved noticeable successes with the development of deep convolutional neural network (CNN). However, most of the works only exploit single type of training data. In this paper, we present a study on classifying bird species by exploiting the combination of both visual (images) and audio (sounds) data using CNN, which has been sparsely treated so far. Specifically, we propose CNN-based multimodal learning models in three types of fusion strategies (early, middle, late) to settle the issues of combining training data cross domains. The advantage of our proposed method lies on the fact that we can utilize CNN not only to extract features from image and audio data (spectrogram) but also to combine the features across modalities. In the experiment, we train and evaluate the network structure on a comprehensive CUB-200-2011 standard data set combing our originally collected audio data set with respect to the data species. We observe that a model which utilizes the combination of both data outperforms models trained with only an either type of data. We also show that transfer learning can significantly increase the classification performance.
Shunta NAKAGAWA Tatsuya NAGAI Hideaki KANEHARA Keisuke FURUMOTO Makoto TAKITA Yoshiaki SHIRAISHI Takeshi TAKAHASHI Masami MOHRI Yasuhiro TAKANO Masakatu MORII
System administrators and security officials of an organization need to deal with vulnerable IT assets, especially those with severe vulnerabilities, to minimize the risk of these vulnerabilities being exploited. The Common Vulnerability Scoring System (CVSS) can be used as a means to calculate the severity score of vulnerabilities, but it currently requires human operators to choose input values. A word-level Convolutional Neural Network (CNN) has been proposed to estimate the input parameters of CVSS and derive the severity score of vulnerability notes, but its accuracy needs to be improved further. In this paper, we propose a character-level CNN for estimating the severity scores. Experiments show that the proposed scheme outperforms conventional one in terms of accuracy and how errors occur.
Xinyu HE Lishuang LI Xingchen SONG Degen HUANG Fuji REN
Biomedical event extraction is an important and challenging task in Information Extraction, which plays a key role for medicine research and disease prevention. Most of the existing event detection methods are based on shallow machine learning methods which mainly rely on domain knowledge and elaborately designed features. Another challenge is that some crucial information as well as the interactions among words or arguments may be ignored since most works treat words and sentences equally. Therefore, we employ a Bidirectional Long Short Term Memory (BLSTM) neural network for event extraction, which can skip handcrafted complex feature extraction. Furthermore, we propose a multi-level attention mechanism, including word level attention which determines the importance of words in a sentence, and the sentence level attention which determines the importance of relevant arguments. Finally, we train dependency word embeddings and add sentence vectors to enrich semantic information. The experimental results show that our model achieves an F-score of 59.61% on the commonly used dataset (MLEE) of biomedical event extraction, which outperforms other state-of-the-art methods.
Weijun LU Chao GENG Dunshan YU
Forecasting commodity futures price is a challenging task. We present an algorithm to predict the trend of commodity futures price based on a type of structuring data and back propagation neural network. The random volatility of futures can be filtered out in the structuring data. Moreover, it is not restricted by the type of futures contract. Experiments show the algorithm can achieve 80% accuracy in predicting price trends.
Hochong PARK Yong-Shik SHIN Seong-Hyeon SHIN
Speech captured by an in-ear microphone placed inside an occluded ear has a high signal-to-noise ratio; however, it has different sound characteristics compared to normal speech captured through air conduction. In this study, a method for blind speech quality enhancement is proposed that can convert speech captured by an in-ear microphone to one that resembles normal speech. The proposed method estimates an input-dependent enhancement function by using a neural network in the feature domain and enhances the captured speech via time-domain filtering. Subjective and objective evaluations confirm that the speech enhanced using our proposed method sounds more similar to normal speech than that enhanced using conventional equalizer-based methods.